Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hotfixing urllib2 + bwb API #2779

Merged
merged 1 commit into from
Dec 27, 2019
Merged

hotfixing urllib2 + bwb API #2779

merged 1 commit into from
Dec 27, 2019

Conversation

mekarpeles
Copy link
Member

@mekarpeles mekarpeles commented Dec 27, 2019

A regression either on OL or BWB's side (relating to urllib2) is causing BWB API calls to fail. This PR is a P0 hotfix which switches urllib2 to use requests in core/vendors.py.

Repro

Minimum repro case:

import urllib2
BWB_API = 'https://products.betterworldbooks.com/service.aspx?ItemId='  # or "http"
req = urllib2.Request(BWB_API + '9780375806704')
f = urllib2.urlopen(req)

results in:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1222, in https_open
    return self.do_open(httplib.HTTPSConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 1] _ssl.c:510: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure>

Test

https://dev.openlibrary.org/books/OL24658622M/Measuring_the_world has BWB price whereas https://openlibrary.org/books/OL24658622M/Measuring_the_world does not

Stakeholders

cc @bfalling, @cdrini

@mekarpeles mekarpeles added Affects: Partners Priority: 0 Fix now: Issue prevents users from using the site or active data corruption. [managed] Lead: @mekarpeles Issues overseen by Mek (Staff: Program Lead) [managed] labels Dec 27, 2019
@mekarpeles mekarpeles self-assigned this Dec 27, 2019
@mekarpeles
Copy link
Member Author

Requesting a retro-CR from either @bfalling or @cdrini. Deploying to fix on prod.

@mekarpeles mekarpeles merged commit 4ba4afd into master Dec 27, 2019
@bfalling
Copy link
Collaborator

LGTM

Copy link
Contributor

@tfmorris tfmorris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this has already been merged, I think we should circle back and clean it up so that it's more consistent.

@@ -2,6 +2,7 @@
import web
import urllib2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be removed and the exception handling for the referenced urllib2.HTTPError reviewed and updated as necessary.

@@ -2,6 +2,7 @@
import web
import urllib2
import simplejson
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm suspicious of the simplejson reference. Does the API really return data XML encoded and errors JSON encoded?

If the data is actually returned as XML, why are we "parsing" it with regexs instead of an XML parser?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mekarpeles I've been finding urllib2 references cause problems and can be replaced completely by requests (as it looks like you have done below.

The simplejson import is generally required with urllib2 to parse the response, but that is not needed with requests as it has a built in json() -- I think replacing urllib2 + simplejson with requests everywhere in the code is something we should aim for, it helps with Python3 and just makes network requests so much easier. Not sure it warrants a single push effort, but I think making it known that it's a common goal may help refactoring + moving forward with adding new features.

The two imports on this file and refactor should be removed.

f = urllib2.urlopen(req)
response = f.read()
f.close()
r = requests.get(url)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that Requests doesn't throw an exception like urllib2 does, so it needs explicit status checking.

I've taken a crack at repairing it in #2894, but this was flagged as an issue over 3 weeks ago, so I would have hoped to not have to take it on myself (as part of the Python 3 work).

@cdrini cdrini deleted the hotfix/bwb-urllib2 branch May 6, 2021 19:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Affects: Partners Lead: @mekarpeles Issues overseen by Mek (Staff: Program Lead) [managed] Priority: 0 Fix now: Issue prevents users from using the site or active data corruption. [managed]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants